

Supplementary Material

Neural Information Processing Systems

The supplementary material is organized as follows. We give details of the definitions and notation in Section B.1. Then, we provide the technical details of the lower bound (Lemma 3.3). In Section D.4 we provide insights into auto-labeling. This suggests that, in these settings, auto-labeling using active learning followed by selective classification is expected to work well; this idea is captured by Chow's excess risk. Nevertheless, it would be interesting future work to explore the connections between auto-labeling and active learning with abstention.
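The selective-classification step mentioned above can be illustrated with a minimal sketch: the model auto-labels only the points on which it is sufficiently confident and abstains on the rest. The function name and the 0.9 threshold are illustrative assumptions, not details from the paper.

```python
import numpy as np

def auto_label(probs, threshold=0.9):
    """Selective classification for auto-labeling: accept a model's
    predicted label only when its confidence exceeds `threshold`;
    abstain (return -1) otherwise, leaving the point for human labeling.
    Hypothetical helper; the threshold is an illustrative assumption."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    return np.where(conf >= threshold, labels, -1)

# Example: three points, two classes.
probs = np.array([[0.95, 0.05],   # confident   -> auto-labeled 0
                  [0.55, 0.45],   # uncertain   -> abstain (-1)
                  [0.10, 0.90]])  # confident   -> auto-labeled 1
print(auto_label(probs))
```

Points that receive -1 are the ones an active-learning loop would route to a human annotator.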







Detecting Overfitting via Adversarial Examples

Neural Information Processing Systems

The repeated community-wide reuse of test sets in popular benchmark problems raises doubts about the credibility of reported test-error rates. Verifying whether a learned model is overfitted to a test set is challenging as independent test sets drawn from the same data distribution are usually unavailable, while other test sets may introduce a distribution shift. We propose a new hypothesis test that uses only the original test data to detect overfitting. It utilizes a new unbiased error estimate that is based on adversarial examples generated from the test data and importance weighting. Overfitting is detected if this error estimate is sufficiently different from the original test error rate. We develop a specialized variant of our test for multiclass image classification, and apply it to testing overfitting of recent models to the popular ImageNet benchmark. Our method correctly indicates overfitting of the trained model to the training set, but is not able to detect any overfitting to the test set, in line with other recent work on this topic.
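The unbiased error estimate at the core of the test rests on standard importance weighting: errors measured on inputs drawn from a proposal distribution q are reweighted by p/q to estimate the error rate under the original distribution p. A toy numerical sketch with known Gaussian densities follows; the distributions, the threshold, and the "error" event are illustrative assumptions, not the paper's adversarial-example construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def pdf(x, mu):
    """Density of N(mu, 1)."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Toy setting: the "test" distribution p is N(0, 1); samples are instead
# drawn from a shifted proposal q = N(0.5, 1).  Importance weights
# w(x) = p(x)/q(x) make the weighted error rate under q an unbiased
# estimate of the error rate under p.
x = rng.normal(0.5, 1.0, size=200_000)   # samples from q
err = (x > 1.0).astype(float)            # model "errs" when x > 1
w = pdf(x, 0.0) / pdf(x, 0.5)            # importance weights p/q
estimate = np.mean(w * err)              # unbiased estimate of P_p(x > 1)
print(f"estimate: {estimate:.3f}")       # true value is about 0.159
```

Overfitting detection then amounts to checking whether such a reweighted estimate differs significantly from the plain test error.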


Characterizing Out-of-Distribution Error via Optimal Transport

Neural Information Processing Systems

Out-of-distribution (OOD) data poses serious challenges for deployed machine learning models, so methods for predicting a model's performance on OOD data without labels are important for machine learning safety. While a number of methods have been proposed in prior work, they often underestimate the actual error, sometimes by a large margin, which greatly impacts their applicability to real tasks. In this work, we identify pseudo-label shift, or the difference between the predicted and true OOD label distributions, as a key indicator of this underestimation. Based on this observation, we introduce a novel method for estimating model performance by leveraging optimal transport theory, Confidence Optimal Transport (COT), and show that it provably provides more robust error estimates in the presence of pseudo-label shift. Additionally, we introduce an empirically-motivated variant of COT, Confidence Optimal Transport with Thresholding (COTT), which applies thresholding to the individual transport costs and further improves the accuracy of COT's error estimates. We evaluate COT and COTT on a variety of standard benchmarks that induce various types of distribution shift -- synthetic, novel subpopulation, and natural -- and show that our approaches significantly outperform existing state-of-the-art methods with up to 3x lower prediction errors.
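A toy sketch of the transport idea: match each softmax vector to a one-hot target drawn according to an assumed label marginal and report the average matching cost. The cost 1 - p[y] used here is a simplification for illustration, not necessarily the paper's exact cost, and the helper name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cot_error_estimate(probs, label_marginal):
    """Toy sketch of Confidence Optimal Transport: build n one-hot
    targets respecting the (assumed known) label marginal, then find the
    min-cost matching between predictions and targets.  Assigning
    prediction p to class y costs 1 - p[y] (an illustrative choice)."""
    n, k = probs.shape
    counts = np.round(label_marginal * n).astype(int)
    counts[-1] = n - counts[:-1].sum()          # make counts sum to n
    targets = np.repeat(np.arange(k), counts)   # target class per column
    cost = 1.0 - probs[:, targets]              # cost[i, j] = 1 - probs[i, targets[j]]
    rows, cols = linear_sum_assignment(cost)    # exact optimal matching
    return cost[rows, cols].mean()

# Reasonably confident predictions matching a balanced label marginal
# yield a low estimated error.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
print(cot_error_estimate(probs, np.array([0.5, 0.5])))
```

Thresholding the individual costs `cost[rows, cols]` before averaging, as COTT does, would discard near-zero matches and sharpen the estimate.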


Lassoed Forests: Random Forests with Adaptive Lasso Post-selection

Shang, Jing, Bannon, James, Haibe-Kains, Benjamin, Tibshirani, Robert

arXiv.org Machine Learning

Tree-based methods are a family of non-parametric approaches in supervised learning. Random forests use a form of bootstrap aggregation, or bagging, to combine a large collection of trees and produce a final prediction. In regression problems, it gives the same weight to each tree and computes the average out-of-bag prediction. In classification problems, it assigns class labels by majority vote. However, since a single-tree model is known to have high variance, a large number of trees need to be trained and aggregated in order to reduce variance (Hastie et al. 2009). This can lead to redundant trees, as the bootstrap procedure may select similar sets of samples to train different trees. Moreover, increasing the number of trees does not reduce the bias. Post-selection boosting random forests, proposed by Wang & Wang (2021), is an attempt to reduce bias by applying Lasso regression (Tibshirani 1996) on the predictions from each individual tree. The method returns a sparser forest with fewer trees, as well as different weights assigned to each individual tree.
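The post-selection idea above is straightforward to sketch with scikit-learn: fit a forest, collect each tree's predictions as features, and let the Lasso assign per-tree weights, zeroing out redundant trees. This is a simplified sketch, not the authors' implementation: it reuses in-sample rather than out-of-bag predictions, and all dataset parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV

# Synthetic regression problem (sizes are illustrative).
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Column t holds tree t's predictions; the Lasso then replaces the
# uniform 1/n_trees average with individual (possibly zero) weights.
tree_preds = np.column_stack([t.predict(X) for t in forest.estimators_])
lasso = LassoCV(cv=5).fit(tree_preds, y)

kept = int(np.sum(lasso.coef_ != 0))
print(f"{kept} of {len(forest.estimators_)} trees kept")
```

The surviving trees form the sparser forest; predictions on new data are `lasso.predict` applied to the per-tree prediction matrix.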


A Notation and preliminaries

Neural Information Processing Systems

A.1 Overview of used notation. Table 1: Glossary of used notation. We recall some basic results on the approximation of functions by tanh neural networks in this section. Using the notation of the proof of Theorem 3.5 (SM B.2), Section 3.3, and SM A.2, it holds that [equation omitted]; this is made exact in [15, Section 4]. We now highlight the main steps in the proof. Putting everything together (B.18), we find that if [condition omitted], the claim follows; this is a consequence of [38, Theorem 36] and Lemma D.1. See SM A.2 for an overview of the notation for finite difference operators.
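The flavor of the tanh-approximation results recalled here can be shown numerically with a random-features simplification: fix a random hidden layer of tanh units and fit only the output layer by least squares. This is an illustrative sketch under assumed sizes and scales, not the constructive approximation proofs cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer tanh network approximating a smooth target function.
# Hidden weights/biases are fixed at random (illustrative scales); the
# output layer is solved by least squares.
x = np.linspace(-1, 1, 200)[:, None]
W = rng.normal(0, 3, size=(1, 50))      # 50 hidden tanh units
b = rng.normal(0, 3, size=50)
H = np.tanh(x @ W + b)                  # hidden-layer activations
f = np.sin(np.pi * x[:, 0])             # target function

coef, *_ = np.linalg.lstsq(H, f, rcond=None)
max_err = np.max(np.abs(H @ coef - f))
print(f"sup-norm error: {max_err:.2e}")
```

Even this crude construction drives the sup-norm error well below the target's scale, which is the qualitative content of the cited approximation bounds.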